INSTRUCTION SEGMENT RECORDING SCHEME



BACKGROUND 

The present invention relates to a recording scheme for instruction segments in a
processor core in which instructions from instruction segments may be cached in reverse
program order.

FIG. 1 is a block diagram illustrating the process of program execution in a conventional 
processor. Program execution may include three stages: front end 110, execution 120 and 
memory 130. The front-end stage 110 performs instruction pre-processing. Front-end
processing is designed with the goal of supplying valid decoded instructions to an execution
core with low latency and high bandwidth. Front-end processing can include branch prediction,
decoding and renaming. As the name implies, the execution stage 120 performs instruction 
execution. The execution stage 120 typically communicates with a memory 130 to operate 
upon data stored therein. 

Conventionally, front-end processing 110 may build instruction segments from stored
program instructions to reduce the latency of instruction decoding and to increase front-end
bandwidth. Instruction segments are sequences of dynamically executed instructions that are
assembled into logical units. The program instructions may have been assembled into the
instruction segment from non-contiguous regions of an external memory space but, when they
are assembled in the instruction segment, the instructions appear in program order. The
instruction segment may include instructions or uops (micro-instructions).

A trace is perhaps the most common type of instruction segment. Typically, a trace may
begin with an instruction of any type. Traces have a single-entry, multiple-exit architecture.
Instruction flow starts at the first instruction but may exit the trace at multiple points, depending
on predictions made at branch instructions embedded within the trace. The trace may end
when one of a number of predetermined end conditions occurs, such as a trace size limit, the
occurrence of a maximum number of conditional branches or the occurrence of an indirect
branch or a return instruction. Traces typically are indexed by the address of the first instruction
therein.
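
The end conditions listed above can be sketched as a simple loop. This is only an illustration under stated assumptions: the function name `build_trace`, the instruction kinds and the limits `max_len` and `max_cond_branches` are hypothetical and do not come from any particular processor.

```python
def build_trace(stream, max_len=8, max_cond_branches=2):
    """Collect (ip, kind) pairs from `stream` until a trace end condition occurs.

    Assumed kinds: "alu", "cond_branch", "indirect_branch", "return".
    """
    trace = []
    cond_branches = 0
    for ip, kind in stream:
        trace.append((ip, kind))
        if kind == "cond_branch":
            cond_branches += 1
        size_limit = len(trace) >= max_len
        branch_limit = cond_branches >= max_cond_branches
        hard_stop = kind in ("indirect_branch", "return")
        if size_limit or branch_limit or hard_stop:
            break
    return trace


stream = [(0, "alu"), (1, "cond_branch"), (2, "alu"), (3, "return"), (4, "alu")]
trace = build_trace(stream)
short_trace = build_trace(stream, max_len=2)
trace_index = trace[0][0]  # a trace is indexed by the address of its first instruction
```

In this sketch the return instruction ends the first trace, while the size limit ends the second.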

Other instruction segments are known. The inventors have proposed an instruction
segment, which they call an "extended block," that has a different architecture than the trace.
The extended block has a multiple-entry, single-exit architecture. Instruction flow may start at
any point within an extended block but, when it enters the extended block, instruction flow must
progress to a terminal instruction in the extended block. The extended block may terminate on
a conditional branch, a return instruction or a size limit. The extended block may be indexed by
the address of the last instruction therein.

A "basic block" is another example of an instruction segment. It is perhaps the
simplest type of instruction segment available. The basic block may terminate on the occurrence
of any kind of branch instruction, including an unconditional branch. The basic block may be
characterized by a single-entry, single-exit architecture. Typically, the basic block is indexed by
the address of the first instruction therein.
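
The three segment types described above differ in their entry and exit architecture and in which instruction address indexes them. A minimal sketch of that comparison follows; the `SegmentKind` type and the `index_address` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentKind:
    name: str
    entries: str     # "single" or "multiple"
    exits: str       # "single" or "multiple"
    indexed_by: str  # "first" or "last" instruction address


TRACE = SegmentKind("trace", "single", "multiple", "first")
EXTENDED_BLOCK = SegmentKind("extended block", "multiple", "single", "last")
BASIC_BLOCK = SegmentKind("basic block", "single", "single", "first")


def index_address(kind, instruction_ips):
    """Return the IP that indexes a segment whose IPs are given in program order."""
    if kind.indexed_by == "first":
        return instruction_ips[0]
    return instruction_ips[-1]


ips = [0x100, 0x104, 0x108]
trace_index = index_address(TRACE, ips)
extended_block_index = index_address(EXTENDED_BLOCK, ips)
```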

Regardless of the type of instruction segment used in a processor 110, the instruction 
segment typically is cached for later use. Reduced latency is achieved when program flow 
returns to the instruction segment because the instruction segment may store instructions 
already assembled in program order. The instructions in the cached instruction segment may be 
furnished to the execution stage 120 faster than they could be furnished from different locations
in an ordinary instruction cache. 

While the use of instruction segments has reduced execution latency, instruction segments tend to
exhibit a high degree of redundancy. A segment cache may store copies of a single instruction
in multiple instruction segments, thereby wasting space in the cache. The inventors propose to
reduce this redundancy by merging one or more segments into a larger, aggregate segment or
by extending one instruction segment to include instructions from another instruction segment 
with overlapping instructions. However, extension of segments is a non-trivial task, for several 
reasons. 

First, instructions typically are cached in program order. To extend instruction segments
at the beginning of the segment would require previously stored instructions to be shifted
downward through a cache to make room for the new instruction. The instructions may be
shifted by varying amounts, depending upon the number of new instructions to be added. This
serial shift may consume a great deal of time, which may impair the effectiveness of the
front-end stage 110.
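
The cost of this serial shift can be illustrated with a small model. The sketch below assumes a cache line modeled as a fixed-length Python list whose used entries are contiguous at the front; the function name `extend_at_front` and the instruction labels are hypothetical.

```python
def extend_at_front(cache_line, new_instrs):
    """Prepend `new_instrs` to a program-ordered line and count the moved entries."""
    used = sum(1 for entry in cache_line if entry is not None)
    k = len(new_instrs)
    if used + k > len(cache_line):
        raise ValueError("extended segment would not fit in the cache line")
    moves = 0
    # Shift every stored instruction k positions toward the end, last one first.
    for pos in range(used - 1, -1, -1):
        cache_line[pos + k] = cache_line[pos]
        moves += 1
    cache_line[:k] = new_instrs
    return moves


line = ["c", "d", "e", None, None, None]
moves = extend_at_front(line, ["a", "b"])
```

Every previously stored instruction moves, so the work grows with the length of the existing segment rather than with the number of instructions added.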

Additionally, the extension may destroy previously established relationships among the
instruction segments. Instruction segments not only are cached, but they also are indexed by
the front-end stage 110 to identify relationships among themselves. For example, program flow
previously may have exited a first segment and arrived at a second segment. A mapping from
the first instruction segment to the second instruction segment may be stored by the front-end
stage 110 in addition to the instruction segments themselves. Oftentimes, the mappings simply
are pointers from one instruction segment to the first instruction in a second instruction
segment.

Extension of instruction segments, however, may cause new instructions to be added to 
the beginning of the segment. In such a case, an old pointer to the segment must be updated to 
circumvent the newly added instructions. Otherwise, if the old mapping were used, the front-end
stage 110 would furnish an incorrect set of instructions to the execution stage 120. The 
processor 100 would execute the wrong instructions. 
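
The stale-pointer problem can be shown with a toy model. The sketch assumes a mapping stored as a position within a program-ordered line; the instruction labels and the helper `entry_instruction` are hypothetical.

```python
def entry_instruction(line, pointer):
    """Return the instruction that a stored mapping points at."""
    return line[pointer]


# Program-order storage of a segment; labels are illustrative only.
line = ["i3", "i4", "i5"]
pointer = 0                                  # mapping to the first instruction, "i3"
before = entry_instruction(line, pointer)

line = ["i1", "i2"] + line                   # extension at the beginning of the segment
stale = entry_instruction(line, pointer)     # the old mapping now selects "i1"
repaired = entry_instruction(line, pointer + 2)  # mapping updated by the number added
```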

Accordingly, there is a need in the art for a front-end processing system that permits 
instruction segments to be extended dynamically without disruption to previously stored 
mappings among the instruction segments. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating the process of program execution in a conventional 
processor. 

FIG. 2 is a block diagram of a front end processing system according to an embodiment 
of the present invention. 

FIG. 3 is a block diagram of a segment cache according to an embodiment of the 
present invention. 

FIG. 4 illustrates a relationship between exemplary segment instructions and a cache bank
according to the embodiments of the present invention.

DETAILED DESCRIPTION 

Embodiments of the present invention provide a recording scheme for instruction
segments that stores the instructions in reverse program order. By storing the instructions in
reverse program order, it becomes easier to extend the instruction segment to include additional
instructions. The instruction segments may be extended without having to re-index tag arrays,
pointers that associate instruction segments with other instruction segments.

FIG. 2 is a block diagram of a front-end processing system 200 according to an
embodiment of the present invention. The front end 200 may include an instruction cache 210
and an instruction segment engine ("ISS") 220. The instruction cache 210 may be based on
any number of known architectures for front-end systems 200. Typically, they include an
instruction memory (or cache) 230, a branch prediction unit ("BPU") 240 and an instruction
decoder 250. Program instructions may be stored in the cache memory 230 and indexed by an
instruction pointer. Instructions may be retrieved from the cache memory 230, decoded by the
instruction decoder 250 and passed to the execution unit (not shown). The BPU 240 may assist
in the selection of instructions to be retrieved from the cache memory 230 for execution. As is
known, instructions may be indexed by an address, called an "instruction pointer" or "IP."

According to an embodiment, an ISS 220 may include a fill unit 260, a segment branch
prediction unit (or "segment BPU") 270 and a segment cache 280. The fill unit 260 may build the
instruction segments. The segment cache 280 may store the instruction segments. The
segment BPU 270 may predict which instruction segments, if any, are likely to be executed and
may cause the segment cache 280 to furnish any predicted segment to the execution unit. The
segment BPU 270 may store masks associated with each of the instruction segments stored by
the segment cache 280, indexed by the IP of the terminal instruction of the instruction
segments.

The ISS 220 may receive decoded instructions from the instruction cache 210. The ISS
220 also may pass decoded instructions to the execution unit (not shown). A selector 290 may
select which front-end source, either the instruction cache 210 or the ISS 220, will supply
instructions to the execution unit. In an embodiment, the segment cache 280 may control the
selector 290.

According to an embodiment, a hit/miss indication from the segment cache 280 may 
control the selector 290. 
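
The hit/miss control of the selector may be pictured as follows. This is a software analogy only: the dictionaries standing in for the segment cache 280 and the instruction cache 210, and the function name `select_source`, are assumptions made for the example.

```python
def select_source(segment_cache, instruction_cache, ip):
    """Pick the front-end source for `ip` based on a segment cache hit or miss."""
    segment = segment_cache.get(ip)
    if segment is not None:
        # Hit: the ISS supplies the whole cached segment.
        return "ISS", segment
    # Miss: fall back to the ordinary instruction cache.
    return "instruction cache", [instruction_cache[ip]]


segment_cache = {0x40: ["ld", "add", "ret"]}
instruction_cache = {0x10: "mov", 0x40: "ld"}
hit = select_source(segment_cache, instruction_cache, 0x40)
miss = select_source(segment_cache, instruction_cache, 0x10)
```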

FIG. 3 is a block diagram of a segment cache 300 according to an embodiment of the
present invention. The segment cache 310 may be populated by a plurality of cache lines
310.1, 310.2, ... 310.N, each of which may store an instruction segment. The segment cache
310 may be constructed from any number of cache structures, including for example a
set-associative cache or a banked cache among others. According to an embodiment, the segment
cache 300 may output a cache line in response to addressing data (not shown) input to the
segment cache 300.

FIG. 4 illustrates a relationship between exemplary segment instructions and the manner
in which they may be stored in a cache line according to the embodiments of the present
invention. In the example of FIG. 4, two different instruction streams are stored in different
locations of the instruction cache (FIG. 2, 210). Assume that the first instruction stream extends
from location IP1 to IP2 and the second instruction stream extends from location IP3 to IP4.
Assume further that a conditional branch in the first instruction stream at location IP5 may cause
program flow to jump to location IP6 in the second instruction stream. For purposes of this
example, it also may be assumed that return instructions are located at locations IP2 and IP4.
It further may be assumed that the ISS (FIG. 2, 220) does not store any previously created
instruction segments.

During execution, a first segment may begin when program flow advances to location IP1
(as by, for example, a conditional branch). Instructions may be retrieved from the instruction
cache 210 until the program flow advances to the conditional branch instruction at location IP5.
Assume that the conditional branch is taken, causing program flow to advance to location IP6.
In an extended block system, for example, the conditional branch would cause the instruction
segment to terminate and a new segment to be created starting at location IP6. The first
instruction segment may be stored in a line of the segment cache (say, 310.2 of FIG. 3).

Program flow may advance from location IP6 to the return instruction at location IP4. The
return instruction would terminate a second instruction segment 420, causing the ISS (FIG. 2,
220) to store the second instruction segment 420 in another cache line 440. The instructions
may be recorded terminal instruction first, then in reverse program order. Thus, the terminal
instruction from location IP4 may be stored in a first position 440.1 of the cache line 440. The
instructions may be stored in reverse program order in advancing locations of the cache line
440 until the instructions are exhausted. In the example of FIG. 4, the instruction at location IP6
is shown stored in position 440.9 in the cache line 440. The second instruction segment 420
need not occupy the full width of the cache line 440. The first instruction segment 410, when
stored in the segment cache 300, also may be stored in reverse program order.
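
The recording order described above can be sketched as follows. The model is an assumption made for illustration: a cache line is a fixed-length list, empty positions are `None`, and `store_reversed` and the instruction labels are hypothetical names.

```python
def store_reversed(segment, line_width):
    """Store a program-ordered segment in a cache line, terminal instruction first."""
    if len(segment) > line_width:
        raise ValueError("segment does not fit in the cache line")
    line = [None] * line_width
    for pos, instr in enumerate(reversed(segment)):
        line[pos] = instr
    return line


# Second segment of the example: program order runs from IP6 to the return at IP4.
segment_420 = ["IP6", "x", "y", "IP4"]
line_440 = store_reversed(segment_420, line_width=6)
```

The terminal instruction lands in the first position, and the unused positions remain at the far end of the line, which is where later extensions can be written.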
Assume that program flow advances to the instruction at location IP3 at some later time.
Instructions may be retrieved from the instruction cache (FIG. 2, 210) until the program flow
advances to the return instruction at location IP4. The ISS (FIG. 2, 220) may construct a third
instruction segment 430 extending from location IP3 to IP4. Rather than store the third
instruction segment 430 in a separate cache line, the ISS 220 instead may extend the second
instruction segment 420 to include the additional instructions from the third instruction segment
430. This occurs simply by writing the excess instructions, those from location IP3 to location
IP6, at the end of the cache line 440 in reverse instruction order. In an embodiment, if the
second instruction segment is subsumed entirely within the third instruction segment, the fastest
way of extending the instruction segment is simply to write the third segment 430 into the cache
line 440. In this embodiment, the instructions of the second segment are overwritten with
identical data.
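
A minimal sketch of such an extension follows, under the same list-based cache line model. The function name `extend_reversed` and the instruction labels are hypothetical, and the sketch covers only the case in which the longer segment ends at the same terminal instruction and subsumes the stored one.

```python
def extend_reversed(line, longer_segment):
    """Append the excess instructions of `longer_segment` to a reverse-ordered line.

    `longer_segment` is given in program order. Returns the number of
    instructions written; stored instructions are not moved.
    """
    stored = [entry for entry in line if entry is not None]
    reversed_longer = list(reversed(longer_segment))
    if reversed_longer[:len(stored)] != stored:
        raise ValueError("segments do not share a common tail")
    if len(reversed_longer) > len(line):
        raise ValueError("extended segment does not fit in the cache line")
    for pos in range(len(stored), len(reversed_longer)):
        line[pos] = reversed_longer[pos]
    return len(reversed_longer) - len(stored)


line_440 = ["IP4", "b", "IP6", None, None, None]   # second segment, reversed
segment_430 = ["IP3", "a", "IP6", "b", "IP4"]       # third segment, program order
written = extend_reversed(line_440, segment_430)
```

Only the two new instructions are written; the three instructions already in the line keep their positions.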

Returning to FIG. 2, as described above, a segment BPU 270 may store addressing 
data for each instruction segment stored in the segment cache 280. Based on instruction flow, 
the segment BPU 270 may predict a next instruction segment to be retrieved from the segment 
cache 280. The segment BPU 270 may output address data to the segment cache 280 to 
cause the cache to output an instruction segment. In this regard, the segment BPU 270 
operates in a manner that may be considered somewhat analogous to the BPU 240. 

The recording scheme of the present invention permits instruction segments to be 
merged without requiring corresponding manipulation of the mappings stored in the segment 
BPU 270. Continuing with the example provided in FIG. 4, when the second instruction 
segment 420 is stored in the cache bank 310, the mapping in the segment BPU 270 may reflect 
the IP of the terminal instruction (IP4) and run length data identifying the number of instructions
contained in the second segment 420. When the second and third instruction segments 420,
430 merge, the mapping for the second instruction segment 420 remains valid. Additional
information may be stored regarding the third instruction segment 430 to identify the IP of the
terminal instruction (again, IP4) and the length of the instruction segment. Thus, the
reverse-order recording scheme provided by the foregoing embodiments facilitates segment extension
without requiring a re-indexing of previously stored segments.
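
The validity of the old mapping after a merge can be checked in a small model. The sketch assumes a mapping is a pair of terminal IP and run length, as described above; the dictionary `mappings`, the helper `read_segment` and the instruction labels are hypothetical.

```python
def read_segment(line, run_length):
    """Recover a segment in program order from a reverse-ordered cache line."""
    return list(reversed(line[:run_length]))


line_440 = ["IP4", "b", "IP6", None, None, None]   # second segment 420, reversed
mappings = {"segment 420": ("IP4", 3)}             # (terminal IP, run length)
before = read_segment(line_440, mappings["segment 420"][1])

line_440[3:5] = ["a", "IP3"]                        # merge: append the excess instructions
mappings["segment 430"] = ("IP4", 5)                # new entry; the old entry is untouched
after = read_segment(line_440, mappings["segment 420"][1])
merged = read_segment(line_440, mappings["segment 430"][1])
```

Because both segments are anchored at the terminal instruction in the first position, the run length alone distinguishes them.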

Several embodiments of the present invention are specifically illustrated and described 
herein. However, it will be appreciated that modifications and variations of the present invention 
are covered by the above teachings and within the purview of the appended claims without
departing from the spirit and intended scope of the invention. 
WE CLAIM:



1. An instruction segment comprising a plurality of instructions stored in sequential 
positions of a cache line in reverse program order. 

2. The instruction segment of claim 1, wherein the instruction segment is an extended
block.

3. The instruction segment of claim 1, wherein the instruction segment is a trace.

4. The instruction segment of claim 1, wherein the instruction segment is a basic block.

5. A segment cache for a front-end system in a processor, comprising a plurality of cache 
entries to store instruction segments in reverse program order. 

6. The segment cache of claim 5, further comprising:
an instruction storage system,
an instruction segment system, comprising:
a fill unit provided in communication with the instruction cache system,
wherein the segment cache is included within the instruction segment system, and
a selector coupled to the output of the instruction cache system and to an output of the
segment cache.

7. The front-end system of claim 6, wherein the instruction segment system further 
comprises a segment predictor provided in communication with the segment cache. 

8. A method for storing instruction segments in a processor, comprising:
building an instruction segment based on program flow, and
storing the instruction segment in a cache in reverse program order.

9. The method of claim 8, further comprising:
building a second instruction segment based on program flow, and
if the first and second instruction segments overlap, extending the first instruction
segment to include non-overlapping instructions from the second instruction segment.

10. The method of claim 9, wherein the extending comprises storing the non-overlapping
instructions in the cache in reverse program order in successive cache positions adjacent to the
instructions from the first instruction segment.

11. The method of claim 8, wherein the instruction segment is an extended block.

12. The method of claim 8, wherein the instruction segment is a trace. 

13. The method of claim 8, wherein the instruction segment is a basic block. 

14. A processing engine, comprising: 

a front end stage to build and store instruction segments in reverse program order, and 
an execution unit in communication with the front end stage. 

15. The processing engine of claim 14, wherein the front-end stage comprises: 
an instruction storage system, 

an instruction segment system, comprising: 

a fill unit provided in communication with the instruction cache system, 

a segment cache, and 
a selector coupled to the output of the instruction cache system and to an output of the 
segment cache. 

16. The method of claim 15, wherein the instruction segment is an extended block.

17. The method of claim 15, wherein the instruction segment is a trace.

18. The method of claim 15, wherein the instruction segment is a basic block.

19. The processing engine of claim 15, wherein the extended segment cache system further 
comprises a segment predictor provided in communication with the segment cache. 
ABSTRACT



In a front-end system for a processor, a recording scheme for instruction segments 
stores the instructions in reverse program order. Instruction segments may be traces, extended 
blocks or basic blocks. By storing the instructions in reverse program order, the instruction 
segment is easily extended to include additional instructions. The instruction segments may be 
extended without having to re-index tag arrays, pointers that associate instruction segments 
with other instruction segments. 


FULL NAME OF FIRST INVENTOR
Family name: JOURDAN
First given name: Stephan
Second given name: J.

RESIDENCE & CITIZENSHIP
City: Portland
State or foreign country: Oregon
Country of citizenship: FRANCE

POST OFFICE ADDRESS
Post office address: 14664 NW Rich Court
City: Portland
State & ZIP code/country: Oregon, 97229

Signature:

Date:


FULL NAME OF SECOND INVENTOR
Family name: RONEN
First given name: Ronny
Second given name:

RESIDENCE & CITIZENSHIP
City: Haifa
State or foreign country: Israel
Country of citizenship: ISRAEL

POST OFFICE ADDRESS
Post office address: 11/11 Harduf Street
City: Haifa
State & ZIP code/country: Israel 34747

Signature:

Date:
DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION



As a below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, first and joint inventor (if plural names are 
listed below) of the subject matter which is claimed and for which a patent is sought on the invention entitled: 

INSTRUCTION SEGMENT RECORDING SCHEME 

the specification of which is attached hereto unless the following is entered: 



was filed on 


as United States Application Number or 
PCT International Application Number 


and was amended on (if applicable) 









I hereby state that I have reviewed and understand the contents of the above-identified specification, including the claims, as amended by any 
amendment referred to above. 

I acknowledge the duty to disclose information which is material to patentability as defined in 37 CFR §1.56. 

PRIOR FOREIGN APPLICATION(S) 

I hereby claim foreign priority benefits under 35 USC §119(a-d) or §365(b) of any foreign application(s) for patent or inventor's certificate, or
§365(a) of any PCT International application which designated at least one country other than the United States, listed below and have also
identified below any foreign application(s) for patent or inventor's certificate, or PCT International application having a filing date before that of



Application Number 


Country 


Filing Date (day/month/year) 


Priority Not Claimed 


None 









PROVISIONAL APPLICATION(S) 

I hereby claim the benefit under 35 USC §119(e) of any United States provisional application(s) listed below:



Application Number 



Filing Date 



PRIOR UNITED STATES APPLICATION(S) 

I hereby claim the benefit under 35 USC §120 of any United States application(s), or §365(c) of any PCT International application designating
the United States, listed below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior United
States or PCT International application in the manner provided by the first paragraph of 35 USC §112, I acknowledge the duty to disclose
information which is material to patentability as defined in 37 CFR §1.56 which became available between the filing date of the prior application
and the national or PCT International filing date of this application:



Application Number 


Filing Date 


Status (patented, pending, abandoned) 









POWER OF ATTORNEY 



I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and to transact all business in the Patent and Trademark 
Office connected therewith: 

Paul H. Heller (Reg. No. 21,074); John C. Altmiller (Reg. No. 25,951); Shawn W. O'Dowd (Reg. No. 34,687); Robert L. Hails, Jr. (Reg. No.
39,702) of KENYON & KENYON with offices located at 1500 K Street NW, Suite 700, Washington, DC, 20005-1257, telephone (202) 220-4200,
and at 333 W. San Carlos Street, Suite 600, San Jose, CA, 95110-2711, telephone (408) 975-7500; and Alan K. Aldous (#31,905); R.
Edward Brake (#37,784); Ben Burge (#42,372); Jeffrey S. Draeger (#41,000); Cynthia Thomas Faatz (#39,973); John N. Greaves (#40,362);
Seth Z. Kalson (#40,670); David J. Kaplan (#41,105); Peter Lam (#44,855); Charles A. Mirho (#41,199); Leo V. Novakoski (#37,198); Thomas
C. Reynolds (#32,488); Kenneth M. Seddon (#43,105); Mark Seeley (#32,299); Steven P. Skabrat (#36,279); Howard A. Skaist (#36,008);
Gene I. Su (#45,140); Calvin E. Wells (#43,256); Raymond J. Werner (#34,752); Robert G. Winkle (#37,474); and Charles K. Young (#39,435)
of INTEL CORPORATION.
Direct telephone calls to:

Robert L. Hails, Jr. 
(202) 220-4200 


Send correspondence to: 

KENYON & KENYON 

1500 K Street, NW, Suite 700

Washington, DC 20005-1257 


I hereby declare that all statements made herein of my own knowledge are true and all statements made on information and belief are believed
to be true; and further that these statements were made with the knowledge that willful false statements and the like so made are punishable
by fine or imprisonment, or both, under §1001 of Title 18 of the United States Code and that such willful statements may jeopardize the validity
of the application or any patent issuing thereon.